nonverbal behavior
React to This (RTT): A Nonverbal Turing Test for Embodied AI
Zhang, Chuxuan, Etesam, Yasaman, Lim, Angelica
We propose an approach to test embodied AI agents for interaction awareness and believability, particularly in scenarios where humans push them to their limits. Turing introduced the Imitation Game as a way to explore the question: "Can machines think?" The Total Turing Test later expanded this concept beyond purely verbal communication, incorporating perceptual and physical interaction. Building on this, we propose a new guiding question: "Can machines react?" and introduce the React to This (RTT) test for nonverbal behaviors, presenting results from an initial experiment.

In 1950, Turing [1] proposed the "imitation game" as a way to address the question: "Can machines think?" Since then, numerous attempts have been made to pass this test [2]. One of the earliest systems to highlight how surface-level language mimicry could deceive users was ELIZA [3], developed in 1965.
- North America > United States > California > Santa Clara County > Stanford (0.04)
- North America > Canada > British Columbia > Metro Vancouver Regional District > Burnaby (0.04)
- Europe > Netherlands > Gelderland > Nijmegen (0.04)
- Europe > France (0.04)
Human-like Nonverbal Behavior with MetaHumans in Real-World Interaction Studies: An Architecture Using Generative Methods and Motion Capture
Chojnowski, Oliver, Eberhard, Alexander, Schiffmann, Michael, Müller, Ana, Richert, Anja
Socially interactive agents are gaining prominence in domains like healthcare, education, and service contexts, particularly virtual agents due to their inherent scalability. To facilitate authentic interactions, these systems require verbal and nonverbal communication, e.g., through facial expressions and gestures. While natural language processing technologies have rapidly advanced, incorporating human-like nonverbal behavior into real-world interaction contexts is crucial for enhancing the success of communication, yet this area remains underexplored. One barrier is creating autonomous systems with sophisticated conversational abilities that integrate human-like nonverbal behavior. This paper presents a distributed architecture using Epic Games' MetaHuman, combined with advanced conversational AI and camera-based user management, that supports methods like motion capture, handcrafted animation, and generative approaches for nonverbal behavior. We share insights into a system architecture designed to investigate nonverbal behavior in socially interactive agents, deployed in a three-week field study in the Deutsches Museum Bonn, showcasing its potential for realistic nonverbal behavior research.
- Europe > Germany > North Rhine-Westphalia > Cologne Region > Cologne (0.05)
- North America > United States > New York > New York County > New York City (0.05)
- North America > United States > Texas > Brazos County > College Station (0.04)
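The distributed architecture described in the MetaHuman entry above can be sketched as loosely coupled components exchanging messages over a shared bus: camera-based user management triggers the conversational AI, whose output is paired with a nonverbal animation command. All component names, topics, and message fields below are illustrative stand-ins, not the authors' actual interfaces.

```python
from dataclasses import dataclass

class Bus:
    """Minimal publish/subscribe bus standing in for the distributed middleware."""
    def __init__(self):
        self.subscribers = {}
    def subscribe(self, topic, handler):
        self.subscribers.setdefault(topic, []).append(handler)
    def publish(self, topic, message):
        for handler in self.subscribers.get(topic, []):
            handler(message)

@dataclass
class AnimationCommand:
    source: str  # "mocap", "handcrafted", or "generative"
    clip: str    # identifier of the animation to play

def make_agent(bus, log):
    # Camera-based user management: a detected user triggers the dialogue.
    def on_user_detected(user_id):
        bus.publish("dialogue/utterance", f"Hello, visitor {user_id}!")
    # Conversational AI output is paired with a nonverbal behavior command.
    def on_utterance(text):
        log.append(("speak", text))
        bus.publish("animation/command", AnimationCommand("generative", "greeting_wave"))
    # The MetaHuman renderer consumes animation commands.
    def on_animation(cmd):
        log.append(("animate", cmd.source, cmd.clip))
    bus.subscribe("vision/user_detected", on_user_detected)
    bus.subscribe("dialogue/utterance", on_utterance)
    bus.subscribe("animation/command", on_animation)

bus, log = Bus(), []
make_agent(bus, log)
bus.publish("vision/user_detected", 7)
```

The decoupling via topics is what lets motion capture, handcrafted, and generative animation sources be swapped behind the same `animation/command` interface.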
EVOLVE: Emotion and Visual Output Learning via LLM Evaluation
Sinclair, Jordan, Reardon, Christopher
While the ability to effectively communicate and retain user attention for longer periods of time is important in many HRI settings, eliciting an impression of empathy through nonverbal behavior can be critical to acceptance of and trust in social robots [1]. Through a comprehensive survey over several LLM-based actions, [2] discovered that social robots elicited higher expectations for more nuanced nonverbal cues, including a breadth of behavior types. Conveying affects that are aligned with the user's emotional state can be critical in building trust around experienced empathy and personalization from a social robot [3]. Multi-modal feedback has profound impacts on successful empathetic interaction, as notions inferred from robot actions can be understood much more easily with systematic actions taken in alignment with an emotional response [2], [4]. Additionally, this kind of subdivided action schema can be used to evaluate many attributes towards promoting empathetic responses, including tone of voice, nonverbal cues, and facial expressions [6]. However, atomic actions with limited sentiments might not be sufficient to accommodate complex emotion in the user. This work investigates the possibility of a more open-ended response selection by leveraging an LLM's internal domain knowledge of emojis and other affective imagery capable of representing emotional states. We also employ recent advances in vision-language models with an image or camera input, as suggested in [2] and [4]. Additionally, we evaluate both motion and color [7] pattern elicitation through atomic action selection [5], [6]. We selected these decision categories based on a theoretical robot design that could contain an LED strip.
- North America > United States > New York > New York County > New York City (0.05)
- Asia > Middle East > Jordan (0.05)
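The atomic-action idea in the EVOLVE entry above can be sketched as a lookup from a detected emotion label to an emoji, a motion primitive, and an LED color pattern. The table and all names are hypothetical; the entry's point is that a real open-ended system would query an LLM's knowledge of affective imagery rather than a fixed table, which here is reduced to a fallback comment.

```python
# Hypothetical atomic-action table: emotion -> (emoji, motion primitive, LED color).
ACTIONS = {
    "joy":     ("😊", "bounce", "yellow"),
    "sadness": ("😢", "slump",  "blue"),
    "anger":   ("😠", "shake",  "red"),
    "neutral": ("😐", "idle",   "white"),
}

def select_response(emotion: str):
    """Return the atomic action for a detected emotion.

    A closed vocabulary like this is exactly the limitation the paper
    targets: an open-ended system would instead ask an LLM to propose an
    emoji/motion/color combination for emotions missing from the table.
    Here unknown emotions simply fall back to 'neutral'.
    """
    return ACTIONS.get(emotion, ACTIONS["neutral"])

emoji, motion, color = select_response("sadness")
```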
Recognizing Emotion Regulation Strategies from Human Behavior with Large Language Models
Müller, Philipp, Heimerl, Alexander, Hossain, Sayed Muddashir, Siegel, Lea, Alexandersson, Jan, Gebhard, Patrick, André, Elisabeth, Schneeberger, Tanja
Human emotions are often not expressed directly, but regulated according to internal processes and social display rules. For affective computing systems, an understanding of how users regulate their emotions can be highly useful, for example to provide feedback in job interview training, or in psychotherapeutic scenarios. However, at present no method to automatically classify different emotion regulation strategies in a cross-user scenario exists. At the same time, recent studies showed that instruction-tuned Large Language Models (LLMs) can reach impressive performance across a variety of affect recognition tasks such as categorical emotion recognition or sentiment analysis. While these results are promising, it remains unclear to what extent the representational power of LLMs can be utilized in the more subtle task of classifying users' internal emotion regulation strategy. To close this gap, we make use of the recently introduced DEEP corpus for modeling the social display of the emotion shame, where each point in time is annotated with one of seven different emotion regulation classes. We fine-tune Llama2-7B as well as the recently introduced Gemma model using Low-rank Optimization on prompts generated from different sources of information on the DEEP corpus. These include verbal and nonverbal behavior, person factors, as well as the results of an in-depth interview after the interaction. Our results show that a fine-tuned Llama2-7B LLM is able to classify the utilized emotion regulation strategy with high accuracy (0.84) without needing access to data from post-interaction interviews. This represents a significant improvement over previous approaches based on Bayesian Networks and highlights the importance of modeling verbal behavior in emotion regulation.
- Europe > Germany > Saarland > Saarbrücken (0.05)
- North America > United States > Tennessee > Davidson County > Nashville (0.04)
- Europe > Portugal > Lisbon > Lisbon (0.04)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Cognitive Science > Emotion (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.35)
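The prompt-generation step in the entry above (serializing verbal behavior, nonverbal behavior, and person factors at one annotated time point into model input) might look roughly like the sketch below. The prompt template, field names, and example values are illustrative assumptions; the paper's actual template and the DEEP corpus's annotation scheme are not reproduced here.

```python
def build_prompt(verbal: str, nonverbal: list, person_factors: dict) -> str:
    """Serialize one annotated time point into an instruction prompt for the
    fine-tuned classifier (hypothetical format, not the paper's template)."""
    lines = [
        "Classify the speaker's emotion regulation strategy.",
        f"Utterance: {verbal}",
        f"Nonverbal behavior: {', '.join(nonverbal)}",
    ]
    # Person factors (e.g. demographics) are appended as key-value lines.
    for key, value in sorted(person_factors.items()):
        lines.append(f"{key}: {value}")
    lines.append("Strategy:")
    return "\n".join(lines)

prompt = build_prompt(
    "I... I am not sure what happened.",
    ["gaze aversion", "self-touch"],
    {"age": 24, "gender": "f"},
)
```

The "without post-interaction interviews" result in the abstract corresponds to simply omitting the interview fields from prompts like this one.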
Nonverbal Interaction Detection
Wei, Jianan, Zhou, Tianfei, Yang, Yi, Wang, Wenguan
This work addresses a new challenge of understanding human nonverbal interaction in social contexts. Nonverbal signals pervade virtually every communicative act. Our gestures, facial expressions, postures, gaze, even physical appearance all convey messages, without anything being said. Despite their critical role in social life, nonverbal signals receive very limited attention compared to their linguistic counterparts, and existing solutions typically examine nonverbal cues in isolation. Our study marks the first systematic effort to enhance the interpretation of multifaceted nonverbal signals. First, we contribute a novel large-scale dataset, called NVI, which is meticulously annotated to include bounding boxes for humans and corresponding social groups, along with 22 atomic-level nonverbal behaviors under five broad interaction types. Second, we establish a new task NVI-DET for nonverbal interaction detection, which is formalized as identifying triplets in the form
- North America > United States > Nebraska > Lancaster County > Lincoln (0.04)
- Asia > China > Beijing > Beijing (0.04)
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis (0.46)
When to generate hedges in peer-tutoring interactions
Abulimiti, Alafate, Clavel, Chloé, Cassell, Justine
This paper explores the application of machine learning techniques to predict where hedging occurs in peer-tutoring interactions. The study uses a naturalistic face-to-face dataset annotated for natural language turns, conversational strategies, tutoring strategies, and nonverbal behaviours. These elements are processed into a vector representation of the previous turns, which serves as input to several machine learning models. Results show that embedding layers, which capture the semantic information of the previous turns, significantly improve the model's performance. Additionally, the study provides insights into the importance of various features, such as interpersonal rapport and nonverbal behaviours, in predicting hedges by using Shapley values for feature explanation. We discover that the eye gaze of both the tutor and the tutee has a significant impact on hedge prediction. We further validate this observation through a follow-up ablation study.
- Oceania > Australia > Victoria > Melbourne (0.04)
- North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.04)
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
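The Shapley-value analysis mentioned in the hedging entry above attributes a prediction to individual features by averaging each feature's marginal contribution over all coalitions of the other features. A self-contained toy version, using exact enumeration over a hypothetical three-feature value function rather than the paper's trained model:

```python
from itertools import combinations
from math import factorial

def shapley_values(features, value):
    """Exact Shapley values: phi_i averages the marginal contribution of
    feature i over every coalition S of the remaining features, with
    weight |S|! * (n - |S| - 1)! / n!."""
    n = len(features)
    phi = {}
    for i in features:
        rest = [f for f in features if f != i]
        total = 0.0
        for k in range(n):
            for S in combinations(rest, k):
                w = factorial(k) * factorial(n - k - 1) / factorial(n)
                total += w * (value(set(S) | {i}) - value(set(S)))
        phi[i] = total
    return phi

def v(S):
    """Hypothetical model output when only the features in S are known."""
    score = 0.0
    if "tutor_gaze" in S: score += 0.4
    if "tutee_gaze" in S: score += 0.3
    if "rapport" in S:    score += 0.1
    if "tutor_gaze" in S and "tutee_gaze" in S: score += 0.2  # interaction term
    return score

phi = shapley_values(["tutor_gaze", "tutee_gaze", "rapport"], v)
```

Note how the 0.2 interaction term between the two gaze features is split equally between them (0.1 each), and the values sum to the full-coalition output, which is the efficiency property that makes Shapley values attractive for feature explanation.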
Admoni
The field of socially assistive robotics (SAR) aims to build robots that help people through social interaction. Human social interaction involves complex systems of behavior, and modeling these systems is one goal of SAR. Nonverbal behaviors, such as eye gaze and gesture, are particularly amenable to modeling through machine learning because the effects of the system--the nonverbal behaviors themselves--are inherently observable. Uncovering the underlying model that defines those behaviors would allow socially assistive robots to become better interaction partners. Our research investigates how people use nonverbal behaviors in tutoring applications. We use data from human-human interactions to build a model of nonverbal behaviors using supervised machine learning. This model can both predict the context of observed behaviors and generate appropriate nonverbal behaviors.
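The two uses of the learned model described above (predicting the context of observed behaviors, and generating appropriate behaviors for a context) can be illustrated with a toy co-occurrence model trained on hypothetical (context, behavior) pairs; it stands in for, and is far simpler than, the supervised models the entry refers to.

```python
from collections import Counter, defaultdict

class NonverbalModel:
    """Toy bidirectional model over (context, behavior) co-occurrence counts."""
    def __init__(self):
        self.by_context = defaultdict(Counter)
        self.by_behavior = defaultdict(Counter)
    def fit(self, pairs):
        for context, behavior in pairs:
            self.by_context[context][behavior] += 1
            self.by_behavior[behavior][context] += 1
        return self
    def predict_context(self, behavior):
        # Recognition direction: most likely context given an observed behavior.
        return self.by_behavior[behavior].most_common(1)[0][0]
    def generate_behavior(self, context):
        # Generation direction: most likely behavior for a given context.
        return self.by_context[context].most_common(1)[0][0]

# Hypothetical tutoring observations.
data = [
    ("explaining", "gaze_at_task"), ("explaining", "deictic_gesture"),
    ("explaining", "gaze_at_task"),
    ("checking_understanding", "gaze_at_partner"),
    ("checking_understanding", "gaze_at_partner"),
]
model = NonverbalModel().fit(data)
```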
Admoni
In typical human interactions, nonverbal behaviors such as eye gazes and gestures serve to augment and reinforce spoken communication. To use similar nonverbal behaviors in human-robot interactions, researchers can apply artificial intelligence techniques such as machine learning, cognitive modeling, and computer vision. But knowledge of nonverbal behavior can also benefit artificial intelligence: because nonverbal communication can reveal human mental states, these behaviors provide additional input to artificial intelligence problems such as learning from demonstration, natural language processing, and motion planning. This article describes how nonverbal communication in HRI can benefit from AI techniques as well as how AI problems can use nonverbal communication in their solutions.
Let's be friends! A rapport-building 3D embodied conversational agent for the Human Support Robot
Pasternak, Katarzyna, Wu, Zishi, Visser, Ubbo, Lisetti, Christine
Partial subtle mirroring of nonverbal behaviors during conversations (also known as mimicking or parallel empathy), is essential for rapport building, which in turn is essential for optimal human-human communication outcomes. Mirroring has been studied in interactions between robots and humans, and in interactions between Embodied Conversational Agents (ECAs) and humans. However, very few studies examine interactions between humans and ECAs that are integrated with robots, and none of them examine the effect of mirroring nonverbal behaviors in such interactions. Our research question is whether integrating an ECA able to mirror its interlocutor's facial expressions and head movements (continuously or intermittently) with a human-service robot will improve the user's experience with the support robot that is able to perform useful mobile manipulative tasks (e.g. at home). Our contribution is the complex integration of an expressive ECA, able to track its interlocutor's face, and to mirror his/her facial expressions and head movements in real time, integrated with a human support robot such that the robot and the agent are fully aware of each other's, and of the user's, nonverbal cues. We also describe a pilot study we conducted towards answering our research question, which shows promising results for our forthcoming larger user study.
- North America > United States > Florida > Miami-Dade County > Miami (0.14)
- North America > United States > New York > New York County > New York City (0.04)
- North America > United States > Florida > Miami-Dade County > Coral Gables (0.04)
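The continuous-versus-intermittent mirroring conditions mentioned in the entry above can be sketched as a simple gating policy over incoming face-tracking frames. The timing constants below are illustrative assumptions, not values from the study.

```python
def mirror_schedule(timestamps, mode, period=4.0, window=2.0):
    """Decide, per frame timestamp (seconds), whether the ECA mirrors the
    interlocutor. 'continuous' always mirrors; 'intermittent' mirrors only
    during the first `window` seconds of every `period`-second cycle."""
    if mode == "continuous":
        return [True] * len(timestamps)
    return [(t % period) < window for t in timestamps]

frames = [0.0, 1.0, 2.5, 3.9, 4.5]
```

A real pipeline would apply the gate to streamed facial-expression and head-pose estimates before retargeting them onto the ECA's face.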
Let's Face It: Probabilistic Multi-modal Interlocutor-aware Generation of Facial Gestures in Dyadic Settings
Jonell, Patrik, Kucherenko, Taras, Henter, Gustav Eje, Beskow, Jonas
To enable more natural face-to-face interactions, conversational agents need to adapt their behavior to their interlocutors. One key aspect of this is generation of appropriate non-verbal behavior for the agent, for example facial gestures, here defined as facial expressions and head movements. Most existing gesture-generating systems do not utilize multi-modal cues from the interlocutor when synthesizing non-verbal behavior. Those that do, typically use deterministic methods that risk producing repetitive and non-vivid motions. In this paper, we introduce a probabilistic method to synthesize interlocutor-aware facial gestures - represented by highly expressive FLAME parameters - in dyadic conversations. Our contributions are: a) a method for feature extraction from multi-party video and speech recordings, resulting in a representation that allows for independent control and manipulation of expression and speech articulation in a 3D avatar; b) an extension to MoGlow, a recent motion-synthesis method based on normalizing flows, to also take multi-modal signals from the interlocutor as input and subsequently output interlocutor-aware facial gestures; and c) a subjective evaluation assessing the use and relative importance of the input modalities. The results show that the model successfully leverages the input from the interlocutor to generate more appropriate behavior. Videos, data, and code available at: https://jonepatr.github.io/lets_face_it.
- Europe > United Kingdom > Scotland > City of Glasgow > Glasgow (0.05)
- North America > United States > New York > New York County > New York City (0.04)
- Research Report > Experimental Study (0.67)
- Research Report > New Finding (0.66)
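The normalizing-flow approach in the entry above hinges on invertible transformations conditioned on context. A minimal pure-Python affine coupling step is sketched below, with the conditioning vector standing in for the interlocutor's multi-modal features; the toy two-dimensional state and hand-picked parameters are illustrative, not MoGlow or the paper's model.

```python
import math

def coupling_forward(x, cond, w):
    """Affine coupling on a 2D vector: x1 passes through unchanged; x2 is
    scaled and shifted by amounts computed from x1 and the conditioning
    features, keeping the map exactly invertible."""
    x1, x2 = x
    h = w[0] * x1 + w[1] * sum(cond)   # tiny "network" producing scale/shift
    s, t = math.tanh(h), w[2] * h
    return [x1, x2 * math.exp(s) + t]

def coupling_inverse(y, cond, w):
    # h is recomputed from the untouched half, so the inverse is exact.
    y1, y2 = y
    h = w[0] * y1 + w[1] * sum(cond)
    s, t = math.tanh(h), w[2] * h
    return [y1, (y2 - t) * math.exp(-s)]

w = [0.5, 0.25, 0.1]        # toy parameters
cond = [0.2, -0.4, 1.0]     # stand-in for interlocutor speech/motion features
y = coupling_forward([0.3, -1.2], cond, w)
x_back = coupling_inverse(y, cond, w)
```

Because sampling runs the inverse direction, changing `cond` changes the generated output, which is how such a model can produce interlocutor-aware rather than fixed gestures.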